Second-Order Word Embeddings from Nearest Neighbor Topological Features
Authors
Abstract
We introduce second-order vector representations of words, induced from nearest neighborhood topological features in pre-trained contextual word embeddings. We then analyze the effects of using second-order embeddings as input features in two deep natural language processing models, for named entity recognition and recognizing textual entailment, as well as a linear model for paraphrase recognition. Surprisingly, we find that nearest neighbor information alone is sufficient to capture most of the performance benefits derived from using pre-trained word embeddings. Furthermore, second-order embeddings are able to handle highly heterogeneous data better than first-order representations, though at the cost of some specificity. Additionally, augmenting contextual embeddings with second-order information further improves model performance in some cases. Due to variance in the random initializations of word embeddings, utilizing nearest neighbor features from multiple first-order embedding samples can also contribute to downstream performance gains. Finally, we identify intriguing characteristics of second-order embedding spaces for further research, including much higher density and different semantic interpretations of cosine similarity.
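The abstract does not spell out how nearest neighbor features are turned into vectors, so the following is only a minimal sketch of the general idea under stated assumptions. It builds a k-nearest-neighbor graph from a toy first-order embedding table using cosine similarity, then represents each word by a binary indicator vector over its neighbors. The indicator encoding, the toy vectors, and the function names `knn_graph` and `second_order_vectors` are illustrative assumptions, not the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_graph(embeddings, k):
    """Map each word to its k nearest neighbors by cosine similarity.

    Ties are broken alphabetically so the output is deterministic.
    """
    words = sorted(embeddings)
    graph = {}
    for word in words:
        scored = [
            (cosine(embeddings[word], embeddings[other]), other)
            for other in words
            if other != word
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[word] = [other for _, other in scored[:k]]
    return graph


def second_order_vectors(graph):
    """Assumed encoding: one binary dimension per vocabulary word,
    set to 1.0 when that word is among the neighbors."""
    vocab = sorted(graph)
    index = {word: i for i, word in enumerate(vocab)}
    vectors = {}
    for word, neighbors in graph.items():
        vec = [0.0] * len(vocab)
        for neighbor in neighbors:
            vec[index[neighbor]] = 1.0
        vectors[word] = vec
    return vectors


# Hypothetical 2-dimensional first-order embeddings, for illustration only.
toy = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
    "bus": [0.2, 0.9],
}

graph = knn_graph(toy, k=1)
second = second_order_vectors(graph)
```

Note that the resulting vectors depend only on which words are neighbors, not on the original coordinates, which is the sense in which nearest neighbor information alone is being used here.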
Similar resources
Building Earth Mover's Distance on Bilingual Word Embeddings for Machine Translation
Following their monolingual counterparts, bilingual word embeddings are also on the rise. As a major application task, word translation has been relying on the nearest neighbor to connect embeddings cross-lingually. However, the nearest neighbor strategy suffers from its inherently local nature and fails to cope with variations in realistic bilingual word embeddings. Furthermore, it lacks a mec...
Learning Label Embeddings for Nearest-Neighbor Multi-class Classification with an Application to Speech Recognition
We consider the problem of using nearest neighbor methods to provide a conditional probability estimate, P(y|a), when the number of labels y is large and the labels share some underlying structure. We propose a method for learning label embeddings (similar to error-correcting output codes (ECOCs)) to model the similarity between labels within a nearest neighbor framework. The learned ECOCs and...
Nontrivial Bloch oscillations in waveguide arrays with second-order coupling.
Under the influence of the next-nearest-neighbor interaction, we theoretically investigate the occurrence of Bloch oscillations in zigzag waveguide arrays. Because of the special topological configuration of the lattice itself, the second-order coupling (SOC) can be enhanced significantly and leads to the band alteration beyond the nearest-neighbor model, i.e., the offset of minimum value from ...
Fast Nearest Neighbor Preserving Embeddings
We show an analog to the Fast Johnson-Lindenstrauss Transform for Nearest Neighbor Preserving Embeddings in ℓ2. These are sparse, randomized embeddings that preserve the (approximate) nearest neighbors. The dimensionality of the embedding space is bounded not by the size of the embedded set n, but by its doubling dimension λ. For most large real-world datasets this will mean a considerably lowe...
A Two-Step Method for Handwritten Farsi Word Recognition Using Adaptive Blocking of the Image Gradient
This paper presents a two-step method for offline handwritten Farsi word recognition. In the first step, to improve recognition accuracy and speed, an algorithm is proposed for initially eliminating lexicon entries that are unlikely to match the input image. For lexicon reduction, the words of the lexicon are clustered using the ISOCLUS and hierarchical clustering algorithms. Clustering is based on the featur...
Journal title:
- CoRR
Volume abs/1705.08488, Issue
Pages -
Publication date 2017